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Abstract 

Multigrid  convergence  rates  degenerate  on.  problems  with  stretched  grids  or 
anisotropic  operators,  unless  one  uses  line  or  plane  relaxation.  For  three  dimen¬ 
sional  problems,  only  plane  relaxation  suffices,  in  general.  While  line  and  plane 
relaxation  algorithms  are  efficient  on  sequential  machines,  they  are  quite  awk¬ 
ward  and  inefficient  on  parallel  machines.  This  paper  presents  a  new  multigrid 
algorithm,  based  on  the  use  of  multiple  coarse  grids,  that  eliminates  the  need  for 
line  or  plane  relaxation  in  anisotropic  problems.  We  develop  this  algorithm,  and- 
extend  the  standard  multigrid  theory  to  establish  rapid  convergence  for  this  class 
of  algorithms.  The  new  algorithm  uses  only  point  relaxation,  allowing  easy  and 
efficient  parallel  implementation,  yet  achieves  robustness  and  convergence  rates 
comparable  to  line  and  plane  relaxation  multigrid  algorithms. 

The  algorithm  described  here  is  a  variant  of  Mulder’s  multigrid  algorithm 
[5]  for  hyperbolic  problems.  The  latter  uses  multiple  coarse  grids  to  achieve 
robustness,  but  is  unsuitable  for  elliptic  problems,  since  its  V-cycle  convergence 
rate  goes  to  one  as  the  number  of  levels  increases.  The  new  algorithm  combines 
the  contributions  from  the  multiple  coarse  grids  via  a  local  “switch,”  based  on  the 
strength  of  the  discrete  operator  in  each  coordinate  direction.  This  improvement 
allows  us  to  show  that  the  V-cyclc  convergence  rate  is  uniformly  bounded  away 
from  one,  on  model  anisotropic  problems.  Moreover,  the  new  algorithm  can  be 
combined  with  the  idea  of  concurrent  iteration  on  all  multigrid  levels  to  yield  a 
highly  parallel  algorithtn  for  strongly  anisotropic  problems. 
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1  Introduction 


As  is  well  known,  the  convergence  rate  of  multigrid  algorithms  based  on  point  relaxation 
smoothers  degenerates  on  problems  exhibiting  strong  anisotropies.  Thus  line  or  plane  re¬ 
laxation  in  each  of  the  coordinate  directions  is  often  needed  to  obtain  good  multigrid  con¬ 
vergence  rates.  Anisotropic  discrete  operators  arise  in  problems  in  which  the  diflerential 
operator  exhibits  stronger  coupling  in  some  coordinate  directions  than  in  others,  or  when 
the  discretization  is  based  on  highly  stretched  grids  having  mesh  aspect  ratios  far  from  unity. 
For  such  problems,  line  relaxation  typically  suffices  in  two  dimensional  problems,  while  for 
three  dimensional  problems  plane  relaxation  is  often  required.  With  standard  (full  coarsen¬ 
ing)  multigrid  in  three  dimensions,  plane  relaxation  in  each  of  the  coordinate  directions  is 
required  in  general  [6]. 

Plane  relaxation  is  very  expensive,  especially  for  systems  of  equations.  Given  this,  interest 
has  focused  recently  on  “semicoarsening”  algorithms,  in  which  the  plane  relaxation  is  carried 
out  in  only  one  direction,  while  the  grid  is  “coarsened”  only  in  the  direction  orthogonal  to 
these  planes  [1].  The  required  plane  solves  can  then  be  done  recursively,  via  an  analogous 
two  dimensional  algorithm,  based  on  line  relaxation  in  one  direction  and  coarsening  in  the 
orthogonal  direction. 

While  such  “semicoarsening”  algorithms  are  fast  and  effective  on  sequential  architectures, 
they  have  limited  and  awkward  parallelism.  The  recursive  solution  of  two  dimensional  prob¬ 
lems,  required  in  the  plane  relaxation,  takes  0(log^  A^)  parallel  operations,  so  that  it  takes 
0(Iog^  N)  parallel  operations  per  three  dimensional  V-cycle,  where  N  is  the  number  of  mesh 
points.  Thus  it  takes  time  at  least  0(log'  N)  to  converge  to  truncation  error. 

An  alternative  way  of  achieving  robustness,  which  avoids  line  and  plane  relaocations  al¬ 
together,  is  to  use  multiple  coarse  grids  formed  by  semicoarsening  in  each  of  the  coordinate 
directions,  as  in  Mulder’s  hyperbolic  algorithms  [5].  In  this  paper,  a  modification  of  Mulder’s 
method  is  proposed,  which  substantially  improves  the  convergence  properties  of  his  method, 
when  applied  to  elliptic  problems.  This  enables  one  to  design  effective  V-cycle  elliptic  solvers 
for  anisotropic  problems,  using  only  point  relaxation  smoothers.  This  new  class  of  methods 
is  shown  to  be  simple,  robust,  and  effective. 

One  can  also  construct  a  highly  parallel  algorithm  for  anisotropic  problems  by  combining 
the  new  algorithm  with  the  idea  of  concurrent  iteration  on  all  multigrid  levels  (2).  This  paper 
gives  numerical  experiments  suggesting  the  efficacy  of  this  approach,  though  wc  have  yet  lo 
explore  the  use  of  this  algorithm  on  parallel  machines. 

2  Algorithm  Design 

.Strong  coupling  of  the  operator  in  a  particular  direction  can  easily  degrade  the  performance 
of  a  multigrid  method.  There  are  several  ways  of  accelerating  convergence  in  the  case  of  such 
anisotropy.  Line  relaxation  and  semicoarsening  melhod.s  can  be  used  lo  correct,  respectively, 
the  inability  of  the  relaxation  method  to  solve  for  some  high  frequencies,  and  the  inability 
of  the  standard  coarse  grid  to  represent  high  frequencies.  In  the  line  relaxation  method,  the 
ineffective  point  relaxation  is  replaced  by  line  relaxation  in  the  direction  of  strong  coupling. 
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By  removing  the  residual  components  due  to  the  strong  coupling,  the  remaining  residual 
(lu<?  to  weak  coupling  in  the  otiicr  direction  can  be  cfTcctively  smoothed  by  the  relaxation. 

I  bus  line  relaxation  together  with  standard  coarsening  is  sufTicicnt  to  uniformly  reduce  all 
I'ourier  components,  in  two  dimensional  problems.  In  the  second  approach,  instead  of  using 
line  relaxations,  the  grid  is  coarsened  only  in  the  direction  of  strongest  coupling.  In  this 
case,  point  relaxation  together  with  semicoarsening  suffices  to  uniformly  reduce  all  Fourier 
components. 

In  addition  to  line  relaxation  and  semicoarsening,  other  methods  have  been  proposed  that 
more  aggressively  solve  for  the  difficult  frequencies.  Hackbusch’s  llobust  Parallel  Multigrid 
[3]  uses  ‘forced  aliasing’  to  represent  high  frequency  components  on  standard  coarse  grids. 
'I'lie  lijgli  frequency  components  of  the  residual  arc  aliased  to  low  frequencies,  solved  for  on 
a  coarse  grid,  and  then  the  coarse  grid  correction  is  “de-aliased”  back  to  the  high  frequency. 
Although  this  method  uses  point  relaxations  and  standard  coarsening,  it  requires  the  use  of 
multiple  coarse  grids,  each  with  a  different  discrete  operator,  and  is  thus  quite  complex. 

In  this  paper  we  will  look  at  a  natural  extension  of  the  second  approach,  use  of  semi- 
coarsening.  This  approach  was  originally  proposed  by  Mulder  [5]  for  overcoming  the  problem 
of  alignment  in  fluid  flow  computations.  The  simple  technique  of  semicoarsening  simulta- 
neou.sly  in  all  coordinate  directions,  and  properly  weighting  the  contributions  from  each  of 
the  coarse  grids,  yields  an  efficient,  robust,  and  easily  parallelizablc  multigrid  method  for 
general  tensor  product  grid. 


3  The  Algorithm 


The  multiple  semicoarse  grid  (MSG)  correction  scheme  (for  linear  problems)  is  similar  to 
the  standard  multigrid  correction  scheme,  except  that  there  arc  now  extra  grids  involved.  In 
two  dimensions  every  grid  is  simultaneously  coarsened  in  two  directions. 

We  first  suppose,  for  simplicity,  that  the  domain  of  the  model  boundary  problem  is  the 
unit  square,  and  that  this  problem  is  to  be  solved  on  an  N  x  N  uniform  grid  given  by 

nk  =  {iih,jh)\  i  =  0,\ . N-U  ;=0,1 . yV-1}, 


where  li  =  l/N  and  N  is  a  power  of  two.  Let  the  subgrid,  fl"*'",  obtained  by  successively 
semi  coarsening  f]/,,  be  the  grid  with  N/2"'  grid  points  in  the  x  direction  and  A^/2”  grid 
points  in  the  y  direction. 

Notice  that  the  notation  does  not  distinguish  Ix'tween  a  grid  obl<\ined  by  semicoarsoning 
first  iti  (he  y  direction  and  then  in  the  x  direction  and  a  grid  obtained  by  semicoarsening  first 
in  the  x  direction  and  then  in  the  y  din’ction.  As  shown  in  Mulder,  in  order  to  con.slruct 
rea.sonable  algorithms  in  three  or  more  dimensions,  I  he  problems  on  wpiivalent  grids  must  be 
cofribinerl,  I'igiire  I  shows  the  interrelations  between  the  various  grids  for  a  two  dimensional 
problem  with  an  8x8  fine  grid.  With  cf)ar.se  grids  combined  as  in  this  diagram,  one  has  a 
oidy  If)  grids  altogether,  while  without  combining  (he  full  binary  Irtx’of  grids  would  contain 
(')'.)  grids  and  hav»!  no  real  numerical  advanlagt'. 

Now  introducing  more  notaf  ion,  the  rliscrelc'  fapialions  on  grid  are  written  as: 


I  rn  I  j 


F 


cn 


riu!  operators  can  be  thought  of  as  either  r'.iscretizatione  of  the  differential  operator, 
A.  on  the  grid  ()"*•”,  or  as  oi)crator8  obtained  Vuriationaily  from  the  fine  grid  and  inteigrid 
transfer  operators. 

A  k  grid  (N  =  2*)  V-cycIc  for  this  method  is  performed  in  three  parts.  In  part  «  the 
information  is  propagated  from  the  fine  grids  to  the  coarse  grids,  in  part  h  the  equations  are 
solved  on  tiie  coarsest  grid,  and  in  part  c  the  information  is  propagated  back  from  the  coarw* 
grids  to  the  fine  grids. 

MSG  algorithm 

fl.  For  /  =  0,l,2,...,2ik-  1: 

For  all  m  >  0,  n  >  0  such  that  m  +  n  =  1: 

1.  If  /  >  0,  combine  restricted  residuals  on  ft”*'" 

2.  Relax  Ui  times  on  the  ft"*'"-grid  equations 

3.  If  m  <  k^  transfer  (restrict)  residual  from  ft"*'"  to  ft'"+'’" 

4.  U  n  <  k,  transfer  (restrict)  residual  from  ft"*'"  to  ft”‘>"+’ 

b.  For  /  =  2k: 

1.  Combine  restricted  residuals  on  ft**'* 

2.  Solve  (1)  by  any  direct  or  iterative  method  on  ft*'* 

3.  Transfer  (interpolate)  correction  from  ft*'*  to  ft*"*’*  and  ft*’*"* 

c.  For  /  —  2A:  —  1,2A:  —  2, . . . ,  1,0: 

For  all  m  >  0,  n  >  0  such  that  m  +  n  =  1: 

1.  If  /  >  0,  combine  interpolated  corrections  on  ft”*’" 

2.  Relax  1/2  times  on  the  ft”*’"-grid  equations 

3.  If  m  >  0,  transfer  (interpolate)  correction  from  ft"*’"  to  ft"*"*’" 

4.  If  n  >  0,  transfer  (interpolate)  correction  from  ft"**"  to  ft"*'""* 


Any  point  relaxation  on  equation  (1)  can  be  used  in  steps  fl2  and  c2.  For  steps  n3,  n4, 
c3,  and  c4  we  consider  intergrid  transfer  operators  which  arc  one  dimensional.  For  example, 
the  residual  restriction  operators  could  be  the  usual  three  point  averaging  formulas,  given 
by  the  stencils: 


MM  + 1  ,n 
'  fri»n 


/m,n+  I 
'  m.ri 


f  1  I  «  ]"*+’•" 

\  *  '•*  <  Im.n 

.  m.n+l 
A 

2 

I 

•  A  ■*  *ii.w 


.3 


Figure  1:  Semicoarwjning  of  an  8  x  8  grid 


(8,8) 

/  \ 

(8,4)  (4,8) 

/  \  /  \ 

(8,2)  (4.4)  (2,8) 

/  \  /  \  /  \ 

(8,1)  (4,2)  (2.4)  (1,8) 

\  /  \  /  \  / 


(4.1)  (2.2)  (1.4) 


(1.1) 


Similarly,  one  dimensional  linear  interpolation  in  the  direction  of  grid  refinement  could  be 
used  to  bring  the  corrections  from  coarse  to  fine  grids. 

We  now  look  at  the  details  of  step  al,  in  which  the  residuals  on  must  be  combined, 
and  step  cl,  in  which  the  corrections  on  0”'’"  must  be  combined.  The  restricted  residuals 
arc  simply  averaged.  Specifically, 

f  )  if  m  >  0,  n  >  0, 


m.n  _  )  jm,n  m-l,n 


if  n  =  0, 


^m,n-r 


if  m  =  0. 


A  weighted  average  of  the  interpolated  corrections  is  used,  so  that 

(  /  ,,m  +  l,n  I  ,  ,m,n  rm.n  ,,m,n+l  If  ^  ^  1.  ^  ^  I. 

+  ^m.n+l“  ll  m  <  fc,  H  <  Af, 


,,m.n  _  J  rm,n  m  +  l,n 


if  n  =  it, 


Tm,n  m,n+l 
■'m,n+l“ 


if  m  =  A*. 


'I'he  MSG  algorithm  can  lead  to  multigrid  convergence  rales  independent  of  mesh  sisse, 
provided  the  weights  uj\  and  wj  are  chosen  properly. 
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4  Convergence  Theory 


In  this  section,  we  give  onr  convergence  proof  for  the  MSG  method  for  a  constant  coefficient 
nio<!cI  problem  in  two  dimensions,  and  derive  sufficient  conditions  on  the  weights.  These 
coFiditions  are  then  used  in  the  next  section  to  motivate  the  choice  of  weights  for  the  variable 
coefficient  problem. 

In  order  to  show  that  the  convergence  rate  of  the  MSG  method  is  independent  of  mesh 
size  for  linear,  constant  coefficient  model  problems,  we  make  the  following  assumptions; 


Al.  The  coarse  grid  operators  are  Galerkin,  or  ‘variational’. 

A2.  The  restriction  and  projection  operators  are  adjoints  of  each  other  and  are  one 
dimensional. 

A3.  The  discretized  operator  is  symmetric  positive  definite, 

Aa,  The  linear  part  of  the  smoother  and  the  discrete  operator  commute. 

We  model  both  our  analysis  and  our  notation  after  that  in  [4],  although  some  notational 
changes  are  needed  in  order  to  keep  track  of  the  multiple  grids  on  each  level.  In  particular, 
it  is  more  convenient  here  to  label  grid  levels  in  the  reverse  order,  so  that  the  grid  level 
increases  as  the  grid  becomes  coarser,  contrary  to  the  standard  convention.  If  the  two  grid 
directions  are  to  be  coarsened  a  maximum  of  m  times  in  the  first  direction  and  n  times  in 
the  second  direction  (0  <  m  <  m  and  0  <  n  <  n)  then  the  coarsest  level  will  be  given  by 
/  =  m  +  n  and  the  coarsest  grid  will  be  given  by  the  indices  m,fi. 

We  are  looking  for  the  solution  of 

40.0^0,0  ^  y0.0 

where  /4“’°  is  symmetric  positive  definite.  For  each  coarser  grid  level,  /  =  1, . . . ,  f,  we  recur¬ 
sively  define  each  of  the  operators  /I"*'",  for  m  -|-  n  =  /, 


/im  — l.n 

•'m-l.n''*  'm.n 


/”i,n  4m,n-l 
t  'm.n— I'*  ^m.n 


if  m  >  0 
if  m  =  0 


Note  that  if  m  and  n  are  both  positive  then  there  are  two  ways  to  construct  the  coarse 
grid  operatok’s  from  the  fine  gri<i  operator.  However,  since  the  intergrid  operators  are  one- 
dimensional, 

rm.n  rm.n-l  _ /m.n  tm-l.n 

'm.n-l 'm  — l.n-I  tn-l.n  —  l  ’ 

and  therefore  either  way  gives  the  same  n’.sult. 

Our  notation  is  as  follows,  Kach  of  the  /!"•'“  are  operators  on  a  finite  diriiensiona]  space, 
.Since  we  will  only  be  looking  at  two  gri<l  levels  al  a  time,  we  simplify  the  grid  indices 


=;  m,7i 


hy  t  he  shortliand  notation: 

k 

ki  =  rjt  +  1 ,  n 
ki  -  m,  77  4-  1 

\\/'  also  define  the  inner  products 

{n\v^)k 

t 

for  any  in  //^'.  The  second,  ‘energy’,  inner  product  induces  a  norm  on  //*,  which  we 

denote  by  || .  H*;,  thus 

ll■'‘ll^  =  [ft  A  = 

We  also  define  four  subspaces  of  each  space  //^  as  follows.  For  i  =  1,2: 

=  R{n,) 

ff  =  {n'‘€ //'‘i  [u^w^]Jt  =  0  for  all  ly*  €  5^} 

Corresponding  to  these  subspaccs,  we  define  projection  operators,  and  Si  such  that 

R{T^)  =  ff^ 
kcr(Tf)  =  S,‘ 

S?  =  I-T^. 

for  i  =  1,2. 

With  the  above  notation,  we  are  now  ready  to  discuss  the  V-cycle  convergence  analysis. 
The  approximation  to  the  solution  of  the  fcth  grid  equations  is  updated  three  times  per  V- 
cyclc;  once  after  the  i/j  relaxation  sweeps  in  step  A2,  once  after  the  coarse  grid  correction  in 
step  Cl  and  once  after  the  1^2  relaxation  sweeps  in  step  c2.  We  label  the  initial  approximation 
as  and  the  approximations  after  each  of  the  three  updates  as  for  i  =  ,1,2  and  3.  If 
Q  denotes  the  relaxation  operator,  then  the  updates  are  given  by 

4.)  = 

"f:,)  - 

where 

We  make  two  additiotial  <issumptions  in  ord<M*  to  simplify  the  case  when  two  grids  on  the 
same  grid  level  are  semicoarsened  in  opposite  directions,  yielding  coarse  grids  of  the  same 
dimensions.  Recall  that  the  coarse  grid  problems  on  these  coarse  grids  are  combined  to  form 
a  single  problem. 

« 


A5.  Th<!  initial  approximation  i.s  «*(|iial  to  zero  on  all  i  xeept  tin*  fifiost  level; 

f***"  flll  0, 0 

At).  There  is  no  smoothing  in  the  finc-to-coarse  part  of  the  V-cyclc: 

i/i  —  0 

These  ivssumptions  guarantee  that  the  residuals  from  both  grids  are  identical,  as  shown  in 
the  following  lemma. 


Lemma  1  For  m,  n  with  0  <  m  <  ni,  0  <  n  <  h: 


jm,n  fm— l.n  _  /nt.n  rm,n— I 


Proof: 

The  lemma  is  proved  by  a  simple  induction  argument  on  the  grid  level  /,  using  the 
additional  assumptions  A5  and  A6,  together  with  Equations  (2)  and  (3).  I 


A  standard  multigrid  V-cyclc  convergence  result  can  be  based  on  a  single  assumption, 
which  combines  both  the  smoothing  and  the  approximation  properties  of  the  problem, 
namely 


>  1  +/? 


,  Wf'Hl  ) 


Wi^^vWl 

for  all  V  €  //*■'  and  for  all  grid  levels  k.  See  [^l].  Here  F*'  is  the  linear  part  of  the  smoothing 
operator  Wo  assume  sufficient  regularity  and  take  a  =  1.  Then  if  we  define  the  multigrid 
convergence  rate  on  the  ki\\  grid  level  as 

=  inf{|lu^'  -  <  fc'llu''  -  v%  for  all  u''  € 

the  V-cycle  convergence  theorem  for  standard  multigrid  algorithms  is 


Theorem  1  (Standard)  Let  k  >  2  and  suppose  wc  have  already  bounded  c*”*  by 

'I'licn 


1  +/?’ 


1  +  ir 


In  our  case,  in  which  we  have  multiple  coarse  grids  on  c*ach  grid  level,  wc  can  prow  a 
similar  result,  In  fact,  the  convergence  of  the  MSC5  V-rycIc  algorithm  can  be  as  good  a.s  the 
convergence  of  a  standard  V-cycle  multigrid  in  which  cvefy  grid  is  somicoarsoned  only  in  the 
optimal  direction. 
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Theorem  2  (MSG)  Suppose  that 


for  all  V  €  and  for  all  grid  levels  k,  and  choose  the  weights,  lOj  andu^,  so  that  ' 

.  .fe  _  _A_ 


If 


then 


Proof; 

First  wc  note  that 


max 


U>I  +  i02  ~  1  • 


(5) 


We  denote  the  'irrors  corresponding  to  the  updates  by  c*,j  for  i  =  0, 1,2  and  3,  where 


-k  _  ,,k  ,,k 

C(,)  -  U  -  U(,) 

and  where  is  the  exact  solution  of  the  ^•th  grid  equations.  Using  Lemma  1  wc  see  that, 
for  both  i  =  1  and  i  =  2, 

By  our  assumptions  wc  also  have 

C(i)  =  C(o).  (6) 

Combining  these  results,  wc  can  write 


+  u.,7'*)c‘„  +  w,  efi)  + 

Then  the  error  before  and  after  the  //j  post-relaxation  sweeps  is 


-(3) 


=  FS 


(2)- 


Since  wc  need  to  look  at  only  two  grid  levels  at  a  time,  we  will  temporarily  suppress  the 
notational  references  to  the  current  grid,  k.  Thus,  wc  define,  for  r  =  1,2  and  j  =  0, 1,2,3. 


- 

e^'  22= 

'(>) 


■0»i 


and  so  on. 


Consider  an  arbitrary  (‘lement,  o,  of  //*',  and  let  y  =  F%k  Then 

[C(3),0]fc  =  [fi(2),2/] 

=  [{ui\T\  +  u;a73)c(o)  +  wi/*,  yj 

=  a;,  ([7\c(i,,T,2/]  +  Sty])  +  (lT,e(t),T,y]  + 

By  the  Cauchy-Schwartz  inequality, 

Ihj),  Wl  <  (r,C|„||  P\y\\  +  £,||.S',«(„II  Ii5,!,||)  +  u,,  (||r,e„)||  \\T-,y\\  +  «2ll«2e(i)ll  ll^^vll)  . 

since,  for  i  =  1, 2, 


Using  the  Cauchy-Schwartz  inequality  one  more  time  gives  us 

Ih3|,.3lp  <  (u.,(||7’ie|,|ir  +  l|S>£(.)ll’)  +«3(l|r,e„,|P  +  ||53e,„f )) 

■  (u,,  (||r,!,||=  +  £.115, vlP)  +  c^idlT^r  +  £,||5,i,f ))  . 

Dividing  through  by  the  square  norms  of  the  initial  error  and  u  and  using  Equations  (5)  and 
(G)  this  simplifies  to 

i[e(3).-]p  u?  /u’.dir.vr +£iii5',yr) +.^.(iinyr +£.iig2!/r)'\ 

iicwipiixip  'iin  \w  )' 

It  is  convenient  to  define  two  variables,  t\  and  <2*  such  that 


ir^’.vir 


I|7’2J/1P 


and  rewrite  our  inequality  <'is 


F-*?w'p  S  -''))  +  (“3('>+«>('  -'j))' 

l|C{0)|  U|o|U  V 


'I’he  smoothing  and  approximation  hypotheses  given  by  l‘>|nalion  (<1)  of  the  theorem  can 
then  be  rewritten  in  terms  of  the  mtw  variables  as, 


2 


5  I  +  /hfii 


Figure  2:  Limits  on  u}\ 


We  can  therefore  write  inequality  (7)  in  terms  of  ii  and  1%  with  either  of  these  upper  bounds 

]|Fu||2 

on  . — — ,  so  wc  arc  free  to  use  whichever  is  smaller.  Thus, 

1 1  ^  1 1 1  • 


i|c(o)IPIlHP  - 


1 


((u;i(ii  +  £i(l  --  ty))  4-  {u^2{h  +  £2(1  ~  ^2)))  •  (8) 


1  +  ftyty'  1  +  ^2^2, 

Taking  the  maximum  over  all  values  of  i\  and  <2  we  arrive  at  the  following  bound  on  the 
convergence  rate, 


£'*  < 


max 


mm 


1 


1 


0  <  <1  <  1 
0  <  <2  <  1 


1  +  /?iti  ’  1  + 


+  £l{l  —  ^l))  +  (^2(^2  +  £2(1  "■  ^2))) 


)■ 


Using  the  definitions  of  the  weights  uj\  and  a»2  and  the  conditions  on  the  c,’8,  it  follows  that 

£^<mm(; . -u-,-,  ]•  * 

VI +/?,’ 1  4-/32/ 


Note  that  the  weights  in  the  hypothesis  of  the  theorem,  while  convenient,  are  not  the 
only  choice.  All  wc  really  needed  in  the  proof  was  that  the  weights  lie  within  some  bounds, 
deteruiincd  by  the  /3,’8.  If  we  define  the  ratio,  r;  =  then  the  theorem  will  be  proved 
provided  uy  <  r/  and  W2  >  1  -  r/  whenever  f\  <  1  and  ivi  >  1  —  I/7  and  u>a  <  l/i;  whenever 
7/  >  1.  For  the  statement  of  the  theorem,  we  have  cho.son  u^i  s=  f;/(f;  4- 1).  Sec  Figure  2. 
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Practical  choice  of  the  weiglits 


Suppose  that,  as  in  Mulder,  the  corrections  from  the  two  coarse  grids  are  averaged.  Tlius, 
the  weights  are  given  by 


U) 


m,n  ^ 
t  — 


1 

2’ 


These  weights  give  a  two-grid  convergence  rate  of  approximately  1/2  in  the  case  of  strong 
alignment,  because  the  appropriate  grid  gets  only  half  of  the  needed  information.  In  this 
case,  our  convergence  result  does  not  guarantee  good  convergence.  Thus  we  should  look  for 
ways,  based  on  our  theorem,  to  improve  the  convergence  of  this  method. 

Recall  that  only  semicoarsening  in  the  direction  of  strongest  coupling  can  significantly 
reduce  the  frequencies  which  cannot  be  reduced  by  the  point  relaxation.  The  weights  provide 
a  way  of  “switching”  to  the  appropriate  coarse  grid  in  the  cases  of  strong  alignment  in  one 
of  the  coordinate  directions.  For  our  model  problem,  otUxx  +  yuyy  =  /,  we  have  some  degree 
of  freedom  in  the  choice  of  the  weights.  We  could  take,  for  instance. 


wi  = 


a2-f  72’ 


UJ'i  = 


Qr2  +  7 


2* 


Since  the  appropriate  grid  gets  all  of  the  needed  information  in  the  case  of  strong  alignment, 
these  weights  can  lead  to  convergence  rates  which  can  be  made  arbitrarily  small  by  increasing 
the  number  of  relaxation  sweeps. 

In  general,  ui  and  tuj  will  vary  over  the  domain  and  we  will  not  know  the  relative  strengths 
a  and  7  explicitly.  Suppose  we  know  that,  locally,  all  modes  which  cannot  be  efficiently 
reduced  by  point  relaxation  can  be  well  approximated  on  the  same  semicoarsened  coarse 
grid.  That  is,  suppose  semicoarsening  can  be  used  locally  to  accelerate  the  convergence. 
In  this  case,  we  would  like  to  determine  the  most  efficient  direction  of  semicoarsening'  and 
choose  our  weights  accordingly.  One  way  to  do  this  is  to  test  the  operator,  at  the  given  grid 
point,  on  two  different  high  frequency  Fourier  modes,  one  oscillatory  only  in  the  x-direction, 
the  other  oscillatory  only  in  the  y-direction.  The  two  modes,  call  them  u  and  v,  which  are 
most  natural  look  locally  like: 


1 

-1 

1 

-1 

1  -1 

-1  -1  -1 

-1  -1  -1 

1 

-1 

1 

-1 

1  -1 

1  1  1 

1  1  1 

1 

-1 

1 

-1 

1  -1 

-1  -1  -1 

-1  -1  -1 

1 

-1 

1 

-1 

1  -1 

1  1  1 

1  1  1 

1 

-1 

1 

-1 

1  -1 

-1  -1  -1 

-1  -1  -1 

1 

-1 

1 

-1 

1  -1 

1  1  1 

1  1  1 

u  V 


Appropriate  weights  at  the  grid  point  (t,j)  can  be  determined  by  applying  the  operator, 
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,jm.n  j,  HeHne 

(/t’"’"iO(»,i)- A„(»,;)  and 


'(’hen  a  reasonnbI«;  choice  for  the  weighta  i«: 


w,  (j.j)- 


yi 


K  +  A?; 


m.n 
U^2 


A2  +  Ar 


i'luiH,  if  there  is  a  direction  in  which  sctnicoarscning  can  give  an  acceptable  convergence  rate, 
this'inethod  should  find  that  direction. 


G  Theoretical  Complexity  -  Sequential  and  Parallel 


y\s  shown  in  Mulder  [5],  the  cost  of  a  sequential  MSG  V-cycle  is  proportional  to  the  total 
luunbcr  of  points  on  all  grids.  In  two  dimensions,  where  the  fine  grid  consists  of  M  x  N 
points,  there  is  a  total  of  (2A/  —  1)(2A^  —  1)  grid  points  on  all  of  the  grids  combined,  as  can 
be  easily  seen  by  arranging  all  of  the  grids  as  in  Figure  3.  Thus,  there  are  approximately  four 
times  as  many  points  on  all  the  grids  as  there  are  on  the  finest  grid,  A  similar  arrangement 
of  all  of  the  grids  obtained  by  semicoarsening  a  three  dimensional  L  x  M  x  N  grid  is  also 
shown  in  Figure  3  giving  a  total  of  (2L  —  1)(2A/  —  l)(2Af  —  1)  grid  points  on  all  of  the 
grids,  approximately  8  times  the  number  of  points  on  the  finest  grid.  The  cost  of  the 
d  dimensional  algorithm,  will  be  roughly  proportional  to  2“*  times  the  number  of  points  on 
the  finest  grid,  A  parallel  implementation  of  the  MSG  algorithm  is  relatively  straightforward 
since  the  computational  work  on  a  given  level  is  local  and  can  be  performed  simultaneously 
at  many  grids  points.  For  a  modest  number  of  processors,  most  of  the  computation  time  is 
spent  on  the  fine  grid  levels  since,  on  each  fine  grid  level,  /,  there  are 


/  Ud- 
^  d-\ 


MN 
2'  ^ 


Id-x 

(d-  1)!2^ 


MN 


grid  points  per  level.  On  coarser  levels  this  is  an  upper  bound  on  the  number  of  grid  points. 
Therefore  the  number  of  grid  points  decreases  like  a  polynomial  divided  by  an  exponential 
as  the  levels  become  coarser.  For  a  large  number  of  processors,  approximately  equal  to  the 
number  of  grid  points  on  the  finest  grid  level  A,  an  equal  amount  of  parallel  computation 
time  is  spent  on  all  grid  levels,  resulting  in  a  computational  cost  per  V-cycle  on  the  order  of 
log{N). 

On  message  passing  machines,  the  communication  between  grid  levels  can  become  a 
problem  as  the  number  of  processors  is  increased.  Consider  what  happens  in  the  extreme 
case  where  we  have  as  many  processors  available  as  we  have  grid  points  on  the  finest  grid. 
If  we  Jissign  one  grid  point  to  each  processor,  then  some  processors  have  more  work  on  the 
coarser  levels  and  some  processors  will  have  no  work,  simply  because  the  multiple  semicoarse 
grids  have  some  grid  points  in  common.  Thus,  some  sort  of  re-distribution  must  occur  in 
order  to  keep  the  load  balanced.  We  propose  two  difTcrent  schemes,  a  simple  scheme  involving 
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I'igure  H:  Sucn'sNivf*  wtnjroariM’ning,  tol/il  riiittilMT  of  poiiilM 


transposes,  which  works  only  in  two  dimensions,  and  a  second  scheme  which  offsets  the  grids 
in  order  to  reduce  the  communication.  Both  .schemes  preserve  the  computational  complexity, 
but  have  differing  Communication  requirements.  The  offsetting  scheme  is  ideal  for  hypcrcubc 
communication  networks,  since  all  communication  is  between  nearest  neighboring  processors. 

The  transpose  scheme  is  ba.scd  on  the  observation  that  in  two  dimensions,  even  if  the 
grids  of  the  same  dimensions  are  not  combined,  the  total  number  of  points  on  each  level  does 
not  increase.  For  example,  if  we  start  with  an  8  x  8  grid  on  level  zero,  we  get  a  4  x  8  and  an 
8x4  grid  on  level  one,  still  having  64  mesh  points.  On  level  two,  we  get  a  2  x  8,  two  4x4, 
and  an  8  X  2  grid.  By  carefully  transposing  and  packing  these  grids,  we  can  fit  all  2*  grids 
on  level  /  into  a  8  x  8  array,  as  shown  in  Figure  4.  This  mapping  docs  keep  all  calculations 
on  a  particular  grid  local,  but  involves  packing  and  shifting  between  grid  levels.  “ 

In  higher  dimensions,  this  type  of  packing  of  the  coarse  grids  docs  not  work.  Moreover, 
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I''igure  4:  'IVanfipow*  scheme 


this  scheme  involves  intense  communication  in  the  packing  phase  which  could  make  it'pro- 
hibitively  expensive  on  some  machines.  For  hypercube  communication  networks,  there  are 
alternate  schemes  which  keep  all  of  the  grid  data  local.  One  possibility  is  to  offset  the  various 
coarse  grids  to  redistribute  the  load.  For  example,  in  two  dimensions,  using  iV*  points  and 
(2A^  ~  1)^  processors,  there  is  a  simple  one-to-one  mapping  from  all  of  the  points  on  all  of 
the  grids  to  the  {2N  —  1)^  processors.  For  every  grid  level  /  (0  <  /  <  A:),  and  for  every  grid 
fi"*’"  on  level  /  (m  >  0,  n  >  0,  m  -f  n  =  /),  let  the  (t,  j)th  grid  point  on  grid  n*"’"  be  assigned 
to  the  (t,  j)  processor  where 

t  =  -f- 2’"  -  1 

j  =  2"+V  +  2V-l, 

where  0  <  t  <  A^/2”*  and  0  <  j  <  N/2”*.  The  quantities  2"*  —  1  and  2"  —  1  in  the  above 
expressions  are  the  horizontal  and  vertical  offsets  for  the  grid.  Since  information  is  only 
transferred  to  grids  differing  by  one  in  either  m  or  n,  then  the  relative  offsets  are  always  by 
a  power  of  2  in  one  direction.  This  scheme  works  equally  well  in  three  or  more  dimensions 
and  the  extension  is  obvious. 

Finally,  we  note  that  the  offsetting  of  the  various  grids  can  easily  be  incorporated  into 
the  interpolation  and  restriction  operations.  During  a  restriction  from  fi”**'*  to  for 

example,  an  averaging  of  three  values  of  the  residual  in  the  x  direction  is  immediately  followed 
by  a  shift  of  all  values  to  the  processor  which  is  2"'  to  the  right.  Similarly  for  the  coarsening 
in  the  y  direction.  During  tlie  interpolation,  this  process  is  reversed.  Interpolated  values 
are  shifted  to  the  left.  This  method  automatically  maintains  the  correct  offset  for  all  of  the 
grids  and  increases  the  communication  by  only  a  constant. 

Note  that,  when  relaxations  are  only  performed  on  one  grid  level  at  a  time,  it  Is  sufficient 
to  offset  the  grids  in  only  one  of  the  two  coordinate  directions,  using  only  N{2N  —  1) 
processors. 

7  Experimental  Results 

Experimentally,  the  MSG  algorithm  converges  extremely  well  for  the  model  problem  oUjc*  + 
■yu^v  -  /,  using  the  weights  suggested  in  Section  5.  Asymptotic  convergence  rales  are  given 
in  Table  1.  These  were  obtained  using  a  random  initial  approximation  to  the  solution, 
rescaling  after  each  iteration  and  observing  the  limit  of  the  ratio  of  subsequent  errors  (fj). 
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All  Hvailabic!  coarse  grid  levels  were  iijmhI,  willi  iwo  red/bluck  SOU  relaKation  sweeps 
grid  level  and  an  exact  solve  on  the  coarsest  level,  1’lie  convergence  rates  are  seen  to  l»e 
uniformly  small  for  all  ratios,  0/7. 

MSCi  was  also  used  on  Poisson’s  equation  on  nonoiniformly  stretched  grids.  Chebyshev 
grids  were  used  in  both  directions,  with  convergence  comparable  to  the  uniformly  stretclied 
grids.  Sec  Table  1.  The  convergence  rates  for  grids  which  have  Chebyshev  stretching  in 
only  one  dircctiotJ  are  also  given  and  can  be  seen  to  l>e  in  the  same  range  as  for  the  model 
(uoblem. 

The  last  entry  in  Table  1  is  for  exponential  stretching  of  the  grid  in  one  of  the  coordinate 
directions.  The  exponential  stretching  is  done  so  that  the  ratio  of  the  lengths  of  the  first 
and  the  last  cell  is  10,000.  The  convergence  rates  are  slightly  worse,  but  still  appear  to  W 
bounded  independently  of  grid  size. 


8x8 

16  X  16 

_ 

32  x32 

64  x64 

_ 

Uniform  Grid 
ah  =  \ 

0.07 

■ 

0 

II 

c- 

0.13 

■oHM 

189 

ah  =  100 

0.16 

0.19 

OnHW 

0 

0 

0 

II 

c- 

0.16 

0.19 

la 

0.21 

Chebyschev  Grid 

0.14 

0.15 

0.16 

0.16 

Uniform/Chebyshev 

0.09 

0.13 

0.15 

0.16 

Exponential  Stretching 

0.15 

0.18 

0.19 

0.20 

Table  1:  Asymptotic  convergence  rates  of  MSG  on  various  types  of  grids 


On  massively  parallel  architectures,  the  relaxation  sweeps  in  the  MSG  algorithm  can  be 
performed  concurrently  on  all  grids  on  all  grid  levels  using  the  CMG  algorithm  of  Gannon 
and  Van  Roscndale  [2].  The  combined  CMG/MSG  algorithm  proceeds  in  two  phases.  In  the 
first  phase,  the  relaxation  is  performed  on  all  grids,  on  all  levels.  The  second  phase  is  the 
intergrid  transfer  phase,  in  which  residuals  and  corrections  from  each  grid  are  transferred  to 
neighboring  coarse  and  fine  grids,  respectively.  Experimental  results  indicate  that  the  ro¬ 
bustness  properties  of  the  MSG  algorithm  are  retained.  In  Table  2,  the  observed  convergence 
rates  arc  given  for  the  model  problem.  Note  that  we  again  observe  that  strong  alignment 
does  not  seriously  degrade  the  convergence.  The  convergence  rates  per  concurrent  iteration 
arc  mostly  in  tlie  0.4-0, 6  range. 
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8x8 

16  X  16 

32  X  32 

64  x64 

Uniform  Grid 

0/7  ic  1 

0.42 

0.52 

0.69 

0.60 

or/7  =  10 

0.46 

0.48 

0.59 

0.61 

0/7=  100 

0.40 

0.43 

0.52 

0.57 

0/7=  1000 

0.41 

0.43 

0.51 

0.55 

Chebyschev  Grid 

0.37 

0.44 

0..50 

0.53 

Uniform/Chebyshev 

0.44 

0.53 

0.56 

0.57 

Exponential  Stretching 

0.48 

0.59 

0.63 

0.63 

Table  2:  Observed  convergence  rates  of  MSG/CMG  on  various  types  of  grids 
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le^Abttrtct 


Multigrid  convergence  rates  degenerate  on  problems  with  stretched  grids  or  anisotropic  operators,  unless  one 
uses  line  or  plane  relaxation.  For  three  dimensional  problems,  only  plane  relaxation  suffices,  in  general.  While 
line  and  plane  relaxation  algorithms  are  efficient  on  sequential  machines,  they  are  quite  awkward  and  inefficient  on 
parallel  machines.  This  paper  presents  a  new  muitigrid  algorithm,  based  on  the  use  of  multiple  coarse  grids,  that 
eliminates  the  need  for  line  or  plane  relaxation  in  anisotropic  problems.  We  develop  this  algorithm,  and  extend  the 
standard  multigrid  theory  to  establish  rapid  convergence  for  this  class  of  algorithms.  The  new  algorithm  uses  only 
point  relaxation,  allowing  easy  and  efficient  parallel  implementation,  yet  achieves  robustness  and  convergence  rates 
comparable  to  line  and  plane  relaxation  muitigrid  algorithms. 

The  algorithm  described  here  is  a  variant  of  Mulder’s  multigrid  algorithm  [5]  for  hyperbolic  problenM.  The  latter 
uses  multiple  coarse  grids  to  achieve  robustness,  but  is  unsuitable  for  elliptic  problems,  since  its  V>cycle  convergence 
rate  goes  to  one  os  the  number  of  levels  increases.  The  new  algorithm  combines  the  contributions  from  the  multiple 
coarse  g'ids  via  a  local  “switch,”  based  on  the  strength  of  the  diKrete  operator  in  each  coordinate  direction.  Thb 
improvement  allows  us  to  show  that  the  V-cycle  convergence  rate  is  uniformly  bounded  away  from  one,  on  model 
anisotropic  problems.  Moreover,  the  new  algorithm  can  be  combined  with  the  idea  of  concurrent  iteration  on  all 
muitigrid  levels  to  yield  a  highly  parallel  algorithm  for  strongly  anisotropic  problems. 
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